construction process
AssurAI: Experience with Constructing Korean Socio-cultural Datasets to Discover Potential Risks of Generative AI
Lim, Chae-Gyun, Han, Seung-Ho, Byun, EunYoung, Han, Jeongyun, Cho, Soohyun, Joo, Eojin, Kim, Heehyeon, Kim, Sieun, Lee, Juhoon, Lee, Hyunsoo, Lee, Dongkun, Hyeon, Jonghwan, Hwang, Yechan, Lee, Young-Jun, Lee, Kyeongryul, An, Minhyeong, Ahn, Hyunjun, Son, Jeongwoo, Park, Junho, Yoon, Donggyu, Kim, Taehyung, Kim, Jeemin, Choi, Dasom, Lee, Kwangyoung, Lim, Hyunseung, Jung, Yeohyun, Hong, Jongok, Nam, Sooyohn, Park, Joonyoung, Na, Sungmin, Choi, Yubin, Choi, Jeanne, Hong, Yoojin, Jang, Sueun, Seo, Youngseok, Park, Somin, Jo, Seoungung, Chae, Wonhye, Jo, Yeeun, Kim, Eunyoung, Whang, Joyce Jiyoung, Hong, HwaJung, Seering, Joseph, Lee, Uichin, Kim, Juho, Choi, Sunna, Ko, Seokyeon, Kim, Taeho, Kim, Kyunghoon, Ha, Myungsik, Lee, So Jung, Hwang, Jemin, Kwak, JoonHo, Choi, Ho-Jin
The rapid evolution of generative AI necessitates robust safety evaluations. However, current safety datasets are predominantly English-centric, failing to capture specific risks in non-English, socio-cultural contexts such as Korean, and are often limited to the text modality. To address this gap, we introduce AssurAI, a new quality-controlled Korean multimodal dataset for evaluating the safety of generative AI. First, we define a taxonomy of 35 distinct AI risk factors, adapted from established frameworks by a multidisciplinary expert group to cover both universal harms and risks relevant to the Korean socio-cultural context. Second, leveraging this taxonomy, we construct and release AssurAI, a large-scale Korean multimodal dataset comprising 11,480 instances across text, image, video, and audio. Third, we apply a rigorous quality control process to ensure data integrity, featuring a two-phase construction (i.e., expert-led seeding and crowdsourced scaling), triple independent annotation, and an iterative expert red-teaming loop. Our pilot study validates AssurAI's effectiveness in assessing the safety of recent LLMs. We release AssurAI to the public to facilitate the development of safer and more reliable generative AI systems for the Korean community.
Efficient Quantum Approximate $k$NN Algorithm via Granular-Ball Computing
Xia, Shuyin, Tian, Xiaojiang, Yuan, Suzhen, Deng, Jeremiah D.
High time complexity is one of the biggest challenges faced by $k$-Nearest Neighbors ($k$NN). Although current classical and quantum $k$NN algorithms have made some improvements, they still have a speed bottleneck when facing large amounts of data. To address this issue, we propose an innovative algorithm called Granular-Ball based Quantum $k$NN (GB-Q$k$NN). This approach achieves higher efficiency by first employing granular-balls, which reduce the amount of data that needs to be processed. The search process is then accelerated by adopting a Hierarchical Navigable Small World (HNSW) method. Moreover, we optimize the time-consuming steps of the HNSW, such as distance calculation, via quantization, further reducing the time complexity of the construction and search processes. By combining granular-balls with quantization of the HNSW method, our approach takes advantage of both treatments and significantly reduces the time complexity of $k$NN-like algorithms, as revealed by a comprehensive complexity analysis.
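The data-reduction idea in the abstract above can be illustrated with a minimal classical sketch. This is not the authors' GB-Q$k$NN: it omits the HNSW index and the quantum steps entirely, and the splitting rule, the purity threshold, and the use of the mean distance as the ball radius are all assumptions made for illustration. The function names `build_granular_balls` and `predict_nearest_ball` are hypothetical.

```python
import numpy as np


def build_granular_balls(X, y, purity=1.0, min_size=2):
    """Cover labelled data with balls (center, radius, majority_label).

    Assumed splitting rule: a ball that is not pure enough is divided by
    assigning each point to the nearer of two class-wise seed centers.
    """
    balls, stack = [], [(X, y)]
    while stack:
        Xb, yb = stack.pop()
        labels, counts = np.unique(yb, return_counts=True)
        center = Xb.mean(axis=0)
        # Assumption: radius is the mean distance of members to the center.
        radius = np.linalg.norm(Xb - center, axis=1).mean()
        ball = (center, radius, labels[counts.argmax()])

        if counts.max() / len(yb) >= purity or len(yb) <= min_size:
            balls.append(ball)
            continue

        seed_a = Xb[yb == labels[0]].mean(axis=0)
        seed_b = Xb[yb != labels[0]].mean(axis=0)
        near_a = (np.linalg.norm(Xb - seed_a, axis=1)
                  <= np.linalg.norm(Xb - seed_b, axis=1))
        if near_a.all() or not near_a.any():
            # Degenerate split: keep the ball as-is so the loop terminates.
            balls.append(ball)
            continue
        stack.append((Xb[near_a], yb[near_a]))
        stack.append((Xb[~near_a], yb[~near_a]))
    return balls


def predict_nearest_ball(balls, x):
    """Label a query by the ball whose surface is closest to it."""
    gaps = [np.linalg.norm(x - c) - r for c, r, _ in balls]
    return balls[int(np.argmin(gaps))][2]


X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])
balls = build_granular_balls(X, y)
pred_low = predict_nearest_ball(balls, np.array([0.5, 0.5]))
pred_high = predict_nearest_ball(balls, np.array([10.5, 10.5]))
```

In this toy example six points collapse into two balls, so a query is compared against two candidates instead of six; the search cost then depends on the number of balls rather than the number of raw points.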
ChineseEcomQA: A Scalable E-commerce Concept Evaluation Benchmark for Large Language Models
Chen, Haibin, Lv, Kangtao, Hu, Chengwei, Li, Yanshi, Yuan, Yujin, He, Yancheng, Zhang, Xingyao, Liu, Langming, Liu, Shilei, Su, Wenbo, Zheng, Bo
With the increasing use of Large Language Models (LLMs) in fields such as e-commerce, domain-specific concept evaluation benchmarks are crucial for assessing their domain capabilities. Existing LLMs may generate factually incorrect information within complex e-commerce applications. Therefore, it is necessary to build an e-commerce concept benchmark. Existing benchmarks encounter two primary challenges: (1) handling the heterogeneous and diverse nature of tasks, and (2) distinguishing between generality and specificity within the e-commerce field. To address these problems, we propose \textbf{ChineseEcomQA}, a scalable question-answering benchmark focused on fundamental e-commerce concepts. ChineseEcomQA is built on three core characteristics: \textbf{Focus on Fundamental Concept}, \textbf{E-commerce Generality} and \textbf{E-commerce Expertise}. Fundamental concepts are designed to be applicable across a diverse array of e-commerce tasks, thus addressing the challenge of heterogeneity and diversity. Additionally, by carefully balancing generality and specificity, ChineseEcomQA effectively differentiates between broad e-commerce concepts, allowing for precise validation of domain capabilities. We achieve this through a scalable benchmark construction process that combines LLM validation, Retrieval-Augmented Generation (RAG) validation, and rigorous manual annotation. Based on ChineseEcomQA, we conduct extensive evaluations on mainstream LLMs and provide some valuable insights. We hope that ChineseEcomQA can guide future domain-specific evaluations and facilitate broader LLM adoption in e-commerce applications.
Adaptive Visual Perception for Robotic Construction Process: A Multi-Robot Coordination Framework
Xu, Jia, Dixit, Manish, Wang, Xi
Construction robots operate in unstructured construction sites, where effective visual perception is crucial for ensuring safe and seamless operations. However, construction robots often handle large elements and perform tasks across expansive areas, resulting in occluded views from onboard cameras and necessitating the use of multiple environmental cameras to capture the large task space. This study proposes a multi-robot coordination framework in which a team of supervising robots equipped with cameras adaptively adjust their poses to visually perceive the operation of the primary construction robot and its surrounding environment. A viewpoint selection method is proposed to determine each supervising robot's camera viewpoint, optimizing visual coverage and proximity while considering the visibility of the upcoming construction robot operation. A case study on prefabricated wooden frame installation demonstrates the system's feasibility, and further experiments are conducted to validate the performance and robustness of the proposed viewpoint selection method across various settings. This research advances visual perception of robotic construction processes and paves the way for integrating computer vision techniques to enable real-time adaption and responsiveness. Such advancements contribute to the safe and efficient operation of construction robots in inherently unstructured construction sites.
Streamlining the Action Dependency Graph Framework: Two Key Enhancements
Multi-Agent Path Finding (MAPF) is critical for coordinating multiple robots in shared environments, yet robust execution of generated plans remains challenging due to operational uncertainties. The Action Dependency Graph (ADG) framework offers a way to ensure correct action execution by establishing precedence-based dependencies between wait and move actions retrieved from a MAPF planning result. The original construction algorithm is not only inefficient, with a quadratic worst-case time complexity, but also results in a network with many redundant dependencies between actions. This paper introduces two key improvements to the ADG framework. First, we prove that wait actions are generally redundant and show that removing them can lead to faster overall plan execution on real robot systems. Second, we propose an optimized ADG construction algorithm, termed Sparse Candidate Partitioning (SCP), which skips unnecessary dependencies and lowers the time complexity to quasi-linear, thereby significantly improving construction speed.
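To make the quadratic baseline mentioned above concrete, here is a minimal sketch of a naive ADG construction. It is not the SCP algorithm from the paper, and the details are assumptions: plans are taken to be per-agent lists of grid cells with one cell per timestep, wait actions are simply dropped, and a cross-agent dependency is added whenever one action's start cell equals another agent's goal cell at an equal or later planned time. The names `Action` and `build_adg_naive` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    agent: int
    step: int     # index among the agent's move actions
    start: tuple  # cell the agent leaves
    goal: tuple   # cell the agent enters
    t: int        # planned start time


def build_adg_naive(plans):
    """Build (actions, edges) from plans: dict agent -> list of cells.

    An edge (a, b) means action a must finish before action b starts.
    """
    actions, edges = [], set()

    # Same-agent edges: each move depends on the agent's previous move.
    for agent, path in plans.items():
        prev, step = None, 0
        for t in range(len(path) - 1):
            if path[t] == path[t + 1]:
                continue  # assumption: wait actions are omitted
            act = Action(agent, step, path[t], path[t + 1], t)
            actions.append(act)
            if prev is not None:
                edges.add((prev, act))
            prev, step = act, step + 1

    # Cross-agent edges: the all-pairs comparison is the quadratic part.
    for a in actions:
        for b in actions:
            if a.agent != b.agent and a.start == b.goal and a.t <= b.t:
                edges.add((a, b))
    return actions, edges


plans = {
    0: [(0, 0), (1, 0), (2, 0)],
    1: [(1, 1), (1, 1), (1, 0)],  # waits once, then enters (1, 0)
}
actions, edges = build_adg_naive(plans)
leave = Action(0, 1, (1, 0), (2, 0), 1)
enter = Action(1, 0, (1, 1), (1, 0), 1)
```

Here agent 1 may enter cell (1, 0) only after agent 0 has left it, which shows up as the single cross-agent edge `(leave, enter)`. The nested loop over all action pairs is where the quadratic cost comes from.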
Customized Information and Domain-centric Knowledge Graph Construction with Large Language Models
Wawrzik, Frank, Plaue, Matthias, Vekariya, Savan, Grimm, Christoph
In this paper we propose a novel approach based on knowledge graphs to provide timely access to structured information, to enable actionable technology intelligence, and to improve cyber-physical systems planning. Our framework encompasses a text mining process, which includes information retrieval, keyphrase extraction, semantic network creation, and topic map visualization. Following this data exploration process, we employ a selective knowledge graph construction (KGC) approach supported by an electronics and innovation ontology-backed pipeline for multi-objective decision-making with a focus on cyber-physical systems. We apply our methodology to the domain of automotive electrical systems to demonstrate the approach, which is scalable. Our results demonstrate that our construction process outperforms GraphGPT as well as our bi-LSTM and transformer REBEL with a pre-defined dataset by several times in terms of class recognition, relationship construction and correct "subclass of" categorization. Additionally, we outline reasoning applications and provide a comparison with Wikidata to show the differences and advantages of the approach.
Machine Learning and Theory Ladenness -- A Phenomenological Account
Termine, Alberto, Ratti, Emanuele, Facchini, Alessandro
In recent years, the dissemination of machine learning (ML) methodologies in scientific research has prompted discussions on theory ladenness. More specifically, the issue of theory ladenness has re-emerged as questions about whether and how ML models (MLMs) and ML modelling strategies are impacted by the domain theory of the scientific field in which ML is used and implemented (e.g., physics, chemistry, biology, etc.). On the one hand, some have argued that there is no difference between traditional (pre ML) and ML assisted science. In both cases, theory plays an essential and unavoidable role in the analysis of phenomena and the construction and use of models. Others have argued instead that ML methodologies and models are theory independent and, in some cases, even theory free. In this article, we argue that both positions are overly simplistic and do not advance our understanding of the interplay between ML methods and domain theories. Specifically, we provide an analysis of theory ladenness in ML assisted science. Our analysis reveals that, while the construction of MLMs can be relatively independent of domain theory, the practical implementation and interpretation of these models within a given specific domain still rely on fundamental theoretical assumptions and background knowledge.
Investigating Multilingual Instruction-Tuning: Do Polyglot Models Demand for Multilingual Instructions?
Weber, Alexander Arno, Thellmann, Klaudia, Ebert, Jan, Flores-Herr, Nicolas, Lehmann, Jens, Fromm, Michael, Ali, Mehdi
The adaption of multilingual pre-trained Large Language Models (LLMs) into eloquent and helpful assistants is essential to facilitate their use across different language regions. In that spirit, we are the first to conduct an extensive study of the performance of multilingual models on parallel, multi-turn instruction-tuning benchmarks across a selection of the most-spoken Indo-European languages. We systematically examine the effects of language and instruction dataset size on a mid-sized, multilingual LLM by instruction-tuning it on parallel instruction-tuning datasets. Our results demonstrate that instruction-tuning on parallel instead of monolingual corpora benefits cross-lingual instruction following capabilities by up to 4.6%. Furthermore, we show that the Superficial Alignment Hypothesis does not hold in general, as the investigated multilingual 7B parameter model presents a counter-example requiring large-scale instruction-tuning datasets. Finally, we conduct a human annotation study to understand the alignment between human-based and GPT-4-based evaluation within multilingual chat scenarios.